Automatic Text Segmentation for Movie Subtitles

نویسندگان

  • Martin Scaiano
  • Diana Inkpen
  • Robert Laganière
  • Adele Reinhartz
چکیده

To improve information retrieval from films we attempt to segment movies into scenes using the subtitles. Film subtitles differ significantly in nature from other texts; we describe some of the challenges of working with movie subtitles. We test a few modifications to the TextTiling algorithm, in order to get an effective segmentation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles

We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR erro...

متن کامل

Synchronizing Translated Movie Subtitles

This paper addresses the problem of synchronizing movie subtitles, which is necessary to improve alignment quality when building a parallel corpus out of translated subtitles. In particular, synchronization is done on the basis of aligned anchor points. Previous studies have shown that cognate filters are useful for the identification of such points. However, this restricts the approach to rela...

متن کامل

Automatic turn segmentation for Movie & TV subtitles

Movie and TV subtitles contain large amounts of conversational material, but lack an explicit turn structure. This paper present a data-driven approach to the segmentation of subtitles into dialogue turns. Training data is first extracted by aligning subtitles with transcripts in order to obtain speaker labels. This data is then used to build a classifier whose task is to determine whether two ...

متن کامل

Bi-text Alignment of Movie Subtitles for Spoken English-Arabic Statistical Machine Translation

We describe efforts towards getting better resources for EnglishArabic machine translation of spoken text. In particular, we look at movie subtitles as a unique, rich resource, as subtitles in one language often get translated into other languages. Movie subtitles are not new as a resource and have been explored in previous research; however, here we create a much larger bi-text (the biggest to...

متن کامل

Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles

This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity and self-reported post-editing effort achieved with such approach are shown to improve those obtained by the technique based in counting characters, mainly employed for automatic subtitle segm...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010